STATISTICAL FACIAL FEATURE EXTRACTION METHOD 

BACKGROUND OF THE INVENTION 

1. Field of the Invention

The present invention relates to a statistical facial feature extraction
method, which uses principal component analysis (PCA) to extract facial
features from images.

2. Description of Related Art 

With the continuous development of information technology, more and
more applications are being introduced into our daily lives. In
particular, effective human-computer interaction makes our lives more
convenient and efficient. With the recent dramatic decrease in the cost of
video and image acquisition, computer vision systems can be deployed
extensively in desktop and embedded systems. For example, an ATM can
identify users from the images captured by its camera, and a video-based
access control system can grant access by recognizing captured face images.

Among all the interfaces between humans and computers, the human
face is commonly regarded as one of the most efficient media, since it
carries a great deal of information (i.e., many facial features such as
eyes, nose, nostrils, eyebrows, mouth, and lips) and is the most visually
discriminative among individuals. Therefore, facial images of individuals
can be recognized more easily than other kinds of images.

Two techniques are typically used for facial feature extraction: a
parameterized model method, which describes the facial features based on
energy-minimized values, and an eigen-image method for detecting facial
features.

The former method uses deformable templates to extract the desired
facial features. Properties of a template, such as its size and shape, are
changed to match the model to the image and thus obtain a more precise
description of the facial features. The execution phase uses peak, valley,
and edge images as representations that highlight the salient features in
the image data, and an energy minimization function to alter the deformable
templates in the image data. The deformable templates are parameterized
models for describing facial features such as the eyes or mouth. Parameter
settings can alter the position, orientation, size and other properties of
the templates. In addition, an automatic feature detection and age
classification system for human face images has been developed in the prior
art. It represents the shape of the eyes or the face contour by parametric
curves (for example, a combination of parabolas or ovals). Next, an energy
function is defined for each facial feature based on its intensity
property. For example, a valley can describe the possible location of an
iris.

However, the cited method is based on finding the deformable model that
best minimizes an energy function having the property of the particular
facial feature of interest, so the minimization process usually needs a
proper initial guess for the deformable model in order to converge.

In the eigen-image method for detecting facial features, a face
recognition system localizes the head and eyes in images on the basis of a
principal component analysis (PCA) algorithm. For the detection of eyes,
typical eigen-eye images are constructed from a basis of eye feature
images. To reduce the computational cost, the correlation between an input
image and the eigen-template image is computed by a Fast Fourier Transform
(FFT) algorithm. However, the cited method uses a separate template for
comparison, and so can find only an individual feature. For example, a left
eye feature image can extract only the corresponding left eye location from
a facial image; it cannot detect the complete features of a whole face
image and is not easily matched to statistical models.

Therefore, it is desirable to provide an improved facial feature
extraction method to mitigate and/or obviate the aforementioned problems.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a statistical facial
feature extraction method, which is based on a principal component analysis
(PCA) technique to describe the appearance and geometric variations of
facial features more accurately.

Another object of the present invention is to provide a statistical
facial feature extraction method, which can combine the statistical
information on geometric feature distribution and photometric feature
appearance obtained in a facial feature training phase, thereby extracting
complete facial features from face images.

A further object of the present invention is to provide a statistical
facial feature extraction method, which does not need a proper initial
guess value, because only candidate feature positions (shapes) are required
to be found in the candidate search ranges of each facial feature, based on
face images completely detected by a face detection method, thereby
reducing system load.

To achieve these objects, the statistical facial feature extraction method
of the present invention comprises a first procedure and a second
procedure. The first procedure creates a statistical face shape model based
on a plurality of training face images. This is achieved by selecting N
training face images and respectively labeling feature points located in n
different blocks of the training face images to define corresponding shape
vectors of the training face images; aligning each shape vector with a
reference shape vector after the shapes for all the face images in the
training data set are labeled; and using a principal component analysis
(PCA) process to compute a plurality of principal components based on the
aligned shape vectors, thus forming the statistical face shape model,
wherein the shape vectors are represented by a statistical face shape in
conjunction with a plurality of projection coefficients.

The second procedure extracts a plurality of facial features from a
test face image. This is achieved by selecting a test face image; guessing
n initial positions of n test feature points, wherein the initial positions
are located in the test face image and each initial position is represented
by a mean value of the n feature points of the aligned shape vectors;
defining n search ranges in the test face image, based on the initial
positions, wherein the search ranges correspond to different blocks,
respectively; labeling a plurality of candidate feature points for each
search range; combining the candidate feature points in different search
ranges to form a plurality of test shape vectors; and matching each test
shape vector to the mean value and the principal components in order to
compute a similarity, wherein the test shape vector having the best
similarity identifies the candidate feature points to be assigned as the
facial features of the test face image.

Other objects, advantages, and novel features of the invention will
become more apparent from the following detailed description when taken
in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a flowchart of an embodiment of the present invention;

FIG. 2 is a schematic diagram of training face images according to
the embodiment of the present invention;

FIG. 3 is a schematic diagram of labeled feature points of FIG. 2
according to the embodiment of the present invention;

FIG. 4 is a flowchart illustrating a process of aligning a shape vector
with a reference shape vector according to the embodiment of the present
invention;

FIG. 5 is a flowchart illustrating a process of calculating a statistical
facial shape model according to the embodiment of the present invention;

FIG. 6 is a schematic diagram of a test face image according to the
embodiment of the present invention;

FIG. 7 is a schematic diagram of search ranges defined by initial
positions of test feature points according to the embodiment of the present
invention;

FIG. 8 is a flowchart illustrating a process of labeling candidate feature
points according to the embodiment of the present invention;

FIG. 9 is a flowchart of decision steps according to the embodiment
of the present invention; and

FIG. 10 is a flowchart of decision steps according to another
embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

Two embodiments are given in the following for purpose of better 
understanding. 

The statistical facial feature extraction method of the present
invention essentially includes two phases: a training phase for creating a
statistical face shape model based on a plurality of training face images;
and an executing phase for extracting a plurality of facial features from a
test face image. In this embodiment, each face image is defined by six
feature points located in different ranges, including four points at the
internal and external corners of the eyes and two points at the corners of
the mouth. Of course, other features such as the nostrils, eyebrows and/or
the like can be defined. These features may vary with different face poses,
lighting conditions or facial expressions. Therefore, a template matching
algorithm is used to find candidates for the facial features. The required
templates for the facial features are constructed from a large number of
training examples in the training phase. In addition, a principal component
analysis (PCA) technique is applied to obtain a more precise description of
the appearance and geometry variations of the facial features.

The training phase:

With reference to the flowchart of FIG. 1, the primary purpose of the
training phase is to create a statistical face shape model and local facial
feature templates based on a plurality of training face images.
Accordingly, N training face images 1 (for example, N = 100 or 1000), as
shown in FIG. 2, are selected as training samples (step S101), preferably
frontal face images, with N as large as possible in order to create a more
accurate model and templates. However, the number of training samples
required depends on practical need. Next, the six feature points of each
training face image 1 are labeled manually (step S102) or automatically by
any known image extraction technique. As shown in FIG. 3, the feature
points labeled on the training face image include coordinates $(x_1,y_1)$,
$(x_2,y_2)$, $(x_3,y_3)$ and $(x_4,y_4)$ of the internal and external
corners of the eyes, and coordinates $(x_5,y_5)$ and $(x_6,y_6)$ of the
corners of the mouth. Accordingly, a shape vector
$\mathbf{x}_j = (x_{j1}, y_{j1}, \ldots, x_{jn}, y_{jn})$ is defined, where
in this embodiment n = 6, $x_{j1}$ equals $x_1$ shown in FIG. 3, $y_{j1}$
equals $y_1$, and so on.

To reduce the differences between training face images 1 due to
variations in face pose and expression, a 2D scaled rigid transform
algorithm is applied to align each shape vector $\mathbf{x}_j$ with a
reference shape vector
$\mathbf{x}_i = (x_{i1}, y_{i1}, \ldots, x_{in}, y_{in})$ by means of
scaling, 2D rotation and shift. The vector $\mathbf{x}_i$ can be one of the
cited N shape vectors $\mathbf{x}_j$ or a self-defined vector corresponding
to the cited feature point coordinates.

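With reference to FIG. 4, there is shown a flowchart of aligning a
shape vector $\mathbf{x}_j$ with a reference shape vector $\mathbf{x}_i$ in
this embodiment. After the reference shape vector $\mathbf{x}_i$ and the
shape vector $\mathbf{x}_j$ are selected (step S401), a squared Euclidean
distance E between the vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ is
computed (step S402) based on the following equation:

$$E = \left(\mathbf{x}_i - M^{(N)}(\alpha,\theta)[\mathbf{x}_j] - \mathbf{t}\right)^{T}\left(\mathbf{x}_i - M^{(N)}(\alpha,\theta)[\mathbf{x}_j] - \mathbf{t}\right),$$

where $M^{(N)}(\alpha,\theta)[\mathbf{x}_j] + \mathbf{t}$ is a geometric
transformation defined by a plurality of transfer parameters to align the
shape vector $\mathbf{x}_j$. The transfer parameters include a rotation
angle $\theta$, a scaling factor $\alpha$, and a shifting vector
represented by $\mathbf{t} = (t_x, t_y)$. In addition,

$$M(\alpha,\theta) = \begin{pmatrix} \alpha\cos\theta & -\alpha\sin\theta \\ \alpha\sin\theta & \alpha\cos\theta \end{pmatrix},$$

$M^{(N)}(\alpha,\theta)$ is a $2n \times 2n$ block diagonal matrix in which
each diagonal block is the $2 \times 2$ matrix $M(\alpha,\theta)$, and

$$M(\alpha,\theta)\begin{pmatrix} x_{jk} \\ y_{jk} \end{pmatrix} = \begin{pmatrix} \alpha\cos\theta\, x_{jk} - \alpha\sin\theta\, y_{jk} \\ \alpha\sin\theta\, x_{jk} + \alpha\cos\theta\, y_{jk} \end{pmatrix},$$

where $1 \le k \le n$. Next, E is minimized as the equation:

$$E = \left(\mathbf{x}_i - M^{(N)}(\alpha_j,\theta_j)[\mathbf{x}_j] - \mathbf{t}_j\right)^{T}\left(\mathbf{x}_i - M^{(N)}(\alpha_j,\theta_j)[\mathbf{x}_j] - \mathbf{t}_j\right),$$

such that the parameters of angle $\theta_j$, factor $\alpha_j$, and vector
$\mathbf{t}_j = (t_{xj}, t_{yj})$ are found and used to align the shape
vector (step S403).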
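After the N shape vectors $\mathbf{x}_j$ in this embodiment are all
aligned with the reference shape vector $\mathbf{x}_i$ (step S404), a least
square algorithm is used to minimize the sum of squared Euclidean distances
between the vectors $\mathbf{x}_j$ and $\mathbf{x}_i$ (step S405). The
least square algorithm for the above minimization leads to solving the
following linear system:

$$\begin{pmatrix} X_2 & -Y_2 & n & 0 \\ Y_2 & X_2 & 0 & n \\ Z & 0 & X_2 & Y_2 \\ 0 & Z & -Y_2 & X_2 \end{pmatrix}\begin{pmatrix} \alpha\cos\theta \\ \alpha\sin\theta \\ t_x \\ t_y \end{pmatrix} = \begin{pmatrix} X_1 \\ Y_1 \\ C_1 \\ C_2 \end{pmatrix},$$

where n is the number of landmark points of each shape and

$$X_1 = \sum_{k=1}^{n} x_{ik},\quad Y_1 = \sum_{k=1}^{n} y_{ik},\quad X_2 = \sum_{k=1}^{n} x_{jk},\quad Y_2 = \sum_{k=1}^{n} y_{jk},$$

$$Z = \sum_{k=1}^{n}\left(x_{jk}^2 + y_{jk}^2\right),\quad C_1 = \sum_{k=1}^{n}\left(x_{ik}x_{jk} + y_{ik}y_{jk}\right),\quad C_2 = \sum_{k=1}^{n}\left(y_{ik}x_{jk} - x_{ik}y_{jk}\right).$$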

Therefore, the transformation parameters are obtained by solving
the above linear system. If the above computation results in a value
smaller than a predetermined threshold (step S406), the aligning step is
finished; otherwise, a mean value of the feature points of the aligned
shape vectors for each block is computed to define a mean shape vector as

$$\bar{\mathbf{x}} = \frac{1}{N}\sum_{a=1}^{N}\mathbf{x}_a \quad \text{(step S407)},$$

where $\mathbf{x}_a$ is an aligned shape vector. After the mean shape
vector $\bar{\mathbf{x}}$ is assigned as the reference shape vector
$\mathbf{x}_i$ and all aligned shape vectors $\mathbf{x}_a$ are assigned as
the shape vectors $\mathbf{x}_j$ (step S408), the process returns to step
S402 and repeats until it converges.
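As an illustration of steps S402 and S403, the scaled rigid alignment of
one shape vector onto a reference can be posed as a linear least-squares
problem in a = α cos θ, b = α sin θ, t_x and t_y. The following Python
sketch assumes NumPy; the function name `align_shape` and the use of
`numpy.linalg.lstsq` in place of an explicit normal-equation solve are
illustrative choices and are not taken from the described method.

```python
import numpy as np

def align_shape(x_j, x_i):
    """Least-squares scaled rigid alignment of shape x_j onto reference x_i.

    Both shapes are flat arrays (x1, y1, ..., xn, yn). Returns the aligned
    shape and the recovered parameters (alpha, theta, t).
    """
    src = np.asarray(x_j, dtype=float).reshape(-1, 2)
    dst = np.asarray(x_i, dtype=float).reshape(-1, 2)
    n = len(src)
    # Unknowns: a = alpha*cos(theta), b = alpha*sin(theta), tx, ty.
    # Each landmark (x, y) -> (x', y') contributes two linear equations:
    #   a*x - b*y + tx = x'
    #   b*x + a*y + ty = y'
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)
    (a, b, tx, ty), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    alpha = np.hypot(a, b)          # scaling factor
    theta = np.arctan2(b, a)        # rotation angle
    M = np.array([[a, -b], [b, a]])
    aligned = src @ M.T + np.array([tx, ty])
    return aligned.reshape(-1), (alpha, theta, np.array([tx, ty]))
```

Calling `align_shape(x_j, x_ref)` for every training shape, averaging the
results, and repeating with the average as the new reference would
correspond to the loop of steps S402 to S408.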

It is noted that the reference shape vector $\mathbf{x}_i$ assigned
when the aligning step is performed for the first time preferably
corresponds to a non-inclined face image, which reduces system load and
processing. However, inclined face images can also be used, because from
the second aligning pass onward (equivalent to steps S402-S408 of FIG. 4) a
mean shape vector is regarded as the reference shape vector. Namely, the
mean shape vector serves as the reference shape vector for gradually
reducing the differences among the shape vectors $\mathbf{x}_j$ until
convergence. Briefly, the major function of the first aligning pass is to
scale and align all shape vectors $\mathbf{x}_j$ so that they are alike,
and the subsequent aligning passes gradually refine the result until the
process converges.

After all shape vectors $\mathbf{x}_j$ are aligned with the assigned
reference shape vector $\mathbf{x}_i$, a principal component analysis (PCA)
technique is used to compute a plurality of principal components and thus
form a statistical face shape model (step S104) from the aligned shape
vectors $\mathbf{x}_a$, wherein the statistical face shape model is a point
distribution model (PDM) and represents the shape vectors $\mathbf{x}_j$ in
conjunction with a plurality of projection coefficients.

For the step of computing the statistical face shape model, refer to the
flowchart of FIG. 5. As shown in FIG. 5, a mean value of the feature points
of the aligned shape vectors is computed to define a mean shape vector as
$\bar{\mathbf{x}} = \frac{1}{N}\sum_{a=1}^{N}\mathbf{x}_a$ (step S501).
Next, the results $d_{x_a} = \mathbf{x}_a - \bar{\mathbf{x}}$ obtained by
subtracting the mean shape vector $\bar{\mathbf{x}}$ from each aligned
shape vector $\mathbf{x}_a$ form a matrix
$A = [d_{x_1}, d_{x_2}, \ldots, d_{x_N}]$ (step S502). Next, the covariance
matrix C of matrix A is computed as $C = AA^{T}$ (step S503). Next, the
plurality of principal components are computed according to the
eigenvectors derived from the equation $C v_k^s = \lambda_k^s v_k^s$, with
eigenvalues corresponding to the covariance matrix C, to form the
statistical face shape model (step S504), wherein $\lambda_k^s$ represents
the eigenvalues of the covariance matrix C, $v_k^s$ represents the
eigenvectors of the covariance matrix C, and $1 \le k \le m$, where m is
the dimension of the covariance matrix C, for
$\lambda_1^s \ge \lambda_2^s \ge \ldots \ge \lambda_m^s$.

Further, in this embodiment, each shape vector $\mathbf{x}_j$ consists
of six (i.e., n = 6) feature vectors $s_j$ located in different blocks, so
an average value of the feature vectors $s_j$ corresponding to a specific
block of all shape vectors $\mathbf{x}_j$, evaluated by the equation
$t = \frac{1}{N}\sum_{j=1}^{N} s_j$, is defined as a feature template.

Once the cited steps of the training phase have been performed, the
statistical face shape model and the feature templates are available for
facial feature extraction in the following executing phase.
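The model-building steps S501 to S504 and the feature template average can
be sketched in Python as follows. This is a minimal sketch assuming NumPy
and aligned shape vectors stored one per row; the function names
(`build_shape_model`, `feature_template`, `project`) and the
`num_components` parameter are hypothetical and introduced only for
illustration.

```python
import numpy as np

def build_shape_model(aligned_shapes, num_components):
    """PCA shape model from aligned shape vectors (one per row, length 2n)."""
    X = np.asarray(aligned_shapes, dtype=float)
    mean_shape = X.mean(axis=0)              # step S501: mean shape vector
    A = (X - mean_shape).T                   # step S502: columns are x_a - mean
    C = A @ A.T                              # step S503: C = A A^T
    eigvals, eigvecs = np.linalg.eigh(C)     # step S504: eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:num_components]
    return mean_shape, eigvecs[:, order], eigvals[order]

def feature_template(feature_vectors):
    """Average of the N training feature vectors s_j of one block."""
    return np.asarray(feature_vectors, dtype=float).mean(axis=0)

def project(shape, mean_shape, components):
    """Projection coefficients b of a shape onto the principal components."""
    return components.T @ (np.asarray(shape, dtype=float) - mean_shape)
```

A shape is then approximated as `mean_shape + components @ b`, where
`b = project(shape, mean_shape, components)`.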


The executing phase (feature extracting phase): 

Refer to the flowchart of FIG. 1 and the schematic diagram of test face
image 2 of FIG. 6. After the test face image 2 is selected (step S105), the
mean shape vector $\bar{\mathbf{x}}$ obtained in the training phase is
regarded as the initial positions of the test feature points of the test
face image 2 (step S106). It is noted that the initial test shape formed by
the test feature points is preferably scaled to match the test face image
2. Based on each initial position, six search ranges are respectively
defined in the test face image 2 (step S107), wherein the sizes of the
search ranges can vary with different test face images 2. Refer to FIG. 7,
in which the search ranges, each corresponding to a different block (i.e.,
one of the corners of the eyes and mouth), are shown. That is, the actual
feature points of the test face image 2 are assumed to be located in the
respective search ranges.

An actual feature point of the test face image 2 may be located at any
coordinate within its search range. Therefore, more precise candidate
feature points are defined in the search ranges (step S108). With reference
to the flowchart of FIG. 8, a plurality of reference points, derived by
$I_i = t + \sum_{j=1}^{k} b_j p_j$, are respectively labeled in each search
range (step S801), where t is the feature template of the block
corresponding to a search range, $p_j$ is the j-th principal component of
the statistical face shape model computed from the training feature
vectors, and $b_j$ is the associated projection coefficient. Next, an error
value between a reference point and the corresponding principal components
$p_j$ and projection coefficients $b_j$ is computed as
$\varepsilon = \left\| I_i - t - \sum_{j=1}^{k} b_j p_j \right\|_2$ (step
S802). Finally, the reference points with the k smallest error values are
selected as the candidate feature points of the search range (step S803).
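Steps S801 to S803 can be sketched as a scan over one search range in which
every position is scored by how well its surrounding patch is explained by
the feature template plus the principal components. The Python sketch below
assumes NumPy, a grayscale image stored as a 2D array, square patches of
odd size, and components with orthonormal columns; the names
`patch_error`, `find_candidates`, `half_range` and `num_candidates` are
hypothetical.

```python
import numpy as np

def patch_error(patch, template, components):
    """Error || I - t - sum_j b_j p_j ||_2 for one patch (step S802).

    components has shape (d, k) with orthonormal columns p_j.
    """
    diff = np.asarray(patch, dtype=float).ravel() - template
    b = components.T @ diff                  # projection coefficients b_j
    return np.linalg.norm(diff - components @ b)

def find_candidates(image, center, half_range, patch_size,
                    template, components, num_candidates):
    """Return the (x, y) positions with the smallest errors (step S803)."""
    h = patch_size // 2
    cx, cy = center
    scored = []
    for y in range(cy - half_range, cy + half_range + 1):
        for x in range(cx - half_range, cx + half_range + 1):
            y0, x0 = y - h, x - h
            # Skip positions whose patch would fall outside the image.
            if (y0 < 0 or x0 < 0 or y0 + patch_size > image.shape[0]
                    or x0 + patch_size > image.shape[1]):
                continue
            patch = image[y0:y0 + patch_size, x0:x0 + patch_size]
            scored.append((patch_error(patch, template, components), (x, y)))
    scored.sort(key=lambda item: item[0])    # smallest error first
    return [pos for _, pos in scored[:num_candidates]]
```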

Therefore, all combinations of the candidate feature points located in
different search ranges are formed, giving $k^n$ test shape vectors (step
S109). In this embodiment, n represents the number of feature points; in
this case, n = 6. If the two candidates with the smallest error values are
extracted for each of the six feature points, $2^6$ (= 64) different test
shape vectors are obtained. All test shape vectors are respectively matched
with the mean value of the aligned shape vectors $\mathbf{x}_a$ and the
principal components of the statistical face shape model to compute a
similarity (step S110). As a result, the candidate feature points
corresponding to the test shape vector with the best similarity are
assigned as the facial features of the test face image 2 (step S111).
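The combination step S109 amounts to a Cartesian product over the candidate
lists of the n search ranges. A minimal Python sketch, assuming NumPy and
candidates given as (x, y) tuples, is shown here; the function name
`enumerate_test_shapes` is hypothetical.

```python
from itertools import product

import numpy as np

def enumerate_test_shapes(candidates_per_feature):
    """Yield every test shape vector (x1, y1, ..., xn, yn).

    candidates_per_feature is a list of n lists of (x, y) candidates,
    one list per search range.
    """
    for combo in product(*candidates_per_feature):
        yield np.array(combo, dtype=float).reshape(-1)
```

With six search ranges and two candidates each, the generator produces
2^6 = 64 shape vectors of length 12, which matches the count given for
this embodiment.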

This embodiment is based on the decision flowchart of FIG. 9 to find
the facial features of the test face image 2. After an approximate value of
a test shape vector is represented as
$\mathbf{x} = \bar{\mathbf{x}} + \sum_{j=1}^{k} b_j p_j$ by the mean shape
vector $\bar{\mathbf{x}}$ and the principal components of the statistical
face shape model (step SA01), a 2D scaled rigid transform algorithm aligns
the test shape vector using the equation
$\mathbf{x} = M(\alpha,\theta)\left[\bar{\mathbf{x}} + \sum_{j=1}^{k} b_j p_j\right] + \mathbf{t}$
(step SA02), where $\theta$, $\alpha$ and $\mathbf{t}$ are a rotation
angle, a scaling factor and a shifting vector respectively. Next, a
normalized distance for the test shape vectors aligned at step SA02 is
computed by d(x) = [equation illegible in source] (step SA03). The
normalized distance d(x) is the criterion used to determine which
combination of candidate feature points is the most similar to a face
shape. Therefore, the candidate feature points corresponding to the aligned
test shape vector having the smallest normalized distance are assigned as
the facial features of the test face image (step SA04).
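The decision flow of FIG. 9 can be sketched as follows. Because the
defining equation of d(x) is not legible in this text, the sketch assumes a
Mahalanobis-style distance, the square root of the sum of b_j^2 / λ_j over
the retained components; this choice is an assumption and may differ from
the formula actually intended. The sketch also fits each test shape to the
mean shape, rather than the reverse, before projecting. NumPy is assumed,
and the names `fit_similarity`, `normalized_distance` and
`select_best_shape` are hypothetical.

```python
import numpy as np

def fit_similarity(src, dst):
    """Closed-form scaled rigid fit of flat shape src onto flat shape dst."""
    src = np.asarray(src, dtype=float).reshape(-1, 2)
    dst = np.asarray(dst, dtype=float).reshape(-1, 2)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s, d = src - mu_s, dst - mu_d
    denom = (s ** 2).sum()
    a = (s * d).sum() / denom                              # alpha*cos(theta)
    b = (s[:, 0] * d[:, 1] - s[:, 1] * d[:, 0]).sum() / denom  # alpha*sin(theta)
    M = np.array([[a, -b], [b, a]])
    return (s @ M.T + mu_d).reshape(-1)

def normalized_distance(shape, mean_shape, components, eigvals):
    """ASSUMED form of d(x): sqrt(sum_j b_j^2 / lambda_j)."""
    aligned = fit_similarity(shape, mean_shape)
    b = components.T @ (aligned - mean_shape)   # projection coefficients
    return float(np.sqrt(np.sum(b ** 2 / eigvals)))

def select_best_shape(test_shapes, mean_shape, components, eigvals):
    """Index and distance of the test shape with the smallest d(x)."""
    distances = [normalized_distance(s, mean_shape, components, eigvals)
                 for s in test_shapes]
    best = int(np.argmin(distances))
    return best, distances[best]
```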

In addition, the invention also provides another embodiment of the
decision flow to find the facial features of the test face image 2. With
reference to FIG. 10, steps SB01 and SB02 are the same as steps SA01 and
SA02 of FIG. 9, but step SB03 in this embodiment computes an error value
between a test shape vector and the corresponding mean shape vector
$\bar{\mathbf{x}}$ as follows:

$$\varepsilon(x) = w_1 \sum_{i=1}^{6} \left\| I_i(x) - t_i - \sum_{j=1}^{k} b_j^i p_j^i \right\|_2 + w_2\, d(x),$$

where $\sum_{i=1}^{6} \left\| I_i(x) - t_i - \sum_{j=1}^{k} b_j^i p_j^i \right\|_2$
is a similarity of the test shape vector to the corresponding aligned shape
vector $\mathbf{x}_a$, and d(x) is the normalized distance of
$\mathbf{x}_a$. The cited error value equation can also be rewritten as

$$\varepsilon(x) = w_1 \sum_{i=1}^{6} \varepsilon_i + w_2\, d(x),$$

based on the error value equation used by step S802, where $\varepsilon_i$
denotes that error value for the i-th feature. Finally, the candidate
feature points corresponding to the test shape vector having the smallest
error value are assigned as the facial features of the test face image
(step SB04).
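The weighted criterion of step SB03 can be sketched in a few lines of
Python. The sketch assumes that the per-feature error values of step S802
and the normalized distance d(x) have already been computed for each test
shape vector; the function names and the default weights of 1.0 are
hypothetical, since the text does not specify values for w_1 and w_2.

```python
def combined_error(texture_errors, shape_distance, w1=1.0, w2=1.0):
    """epsilon(x) = w1 * sum_i eps_i + w2 * d(x) for one test shape."""
    return w1 * sum(texture_errors) + w2 * shape_distance

def select_by_combined_error(errors_per_shape, shape_distances,
                             w1=1.0, w2=1.0):
    """Index of the test shape with the smallest combined error, plus scores."""
    scores = [combined_error(e, d, w1, w2)
              for e, d in zip(errors_per_shape, shape_distances)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Raising w2 relative to w1 favors combinations whose geometry resembles the
trained face shapes, while raising w1 favors combinations whose local
appearance matches the feature templates.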

As cited above, the invention applies the principal component
analysis (PCA) technique to describe the appearance and geometric
variations of facial features more precisely, and further extracts the
entire set of facial features by combining the statistical data on
geometric and photometric appearance properties obtained in the training
phase. Thus, the prior-art problem of extracting only the facial feature of
a single portion is mitigated. In addition, the invention does not need a
proper initial guess value, because only candidate feature positions
(shapes) are required to be found in the candidate search ranges of each
facial feature, based on face images completely detected by a face
detection algorithm, thereby reducing system load.

Although the present invention has been explained in relation to its
preferred embodiment, it is to be understood that many other possible
modifications and variations can be made without departing from the spirit
and scope of the invention as hereinafter claimed.